Learning Goals:

  1. Explain the concept of causality within the potential outcomes framework
  2. Explain how randomized experiments can be used to generate evidence in support of causal claims, thereby solving the fundamental problem of causal inference


Review of Enos Trains Study:

  1. What was the hypothesis that Enos wanted to test?
  2. If Enos already knew that people living in areas with higher concentrations of immigrants were more hostile to immigration, why did he decide to run an experiment in the first place?


Establishing Causal Claims

A Working Example

Suppose you wanted to measure the effectiveness of Metababoost™, a new weight-loss drink that has become very popular recently.

You conduct a large tracking survey where you are able to re-interview the same people at various points over 12 months.

You are interested in how BMI changes amongst people who regularly consume Metababoost™, compared against those who do not.

After one year, you find that people who regularly consume Metababoost™ had a larger drop in BMI than people in the comparison group. The difference is 1.5 kg/m2, on average.

  1. Can you conclude that Metababoost™ causes weight loss? Why or why not?
  2. What does it mean to say that Metababoost™ causes the average person to lose 1.5 kg/m2?

Potential Outcomes

Illustration of potential outcomes for the change in BMI, depending on whether or not an individual consumes Metababoost™

            BMI Change if No Metababoost™   BMI Change if Metababoost™   Difference
Alex                      2                              0                  -2
Bonnie                    1                              0                  -1
Colin                     0                              0                   0
Danielle                  0                             -3                  -3
Earl                     -3                             -6                  -3
Fiona                    -4                             -4                   0
Gaston                   -6                             -7                  -1
Hermine                  -8                            -10                  -2
AVERAGE                  -2.25                          -3.75               -1.5

To say that Metababoost™ causes a BMI drop of 1.5 kg/m2, we mean that in an imaginary counterfactual world where the people who actually drank Metababoost™ instead did not drink it, their BMI would be 1.5 kg/m2 higher, on average.

Similarly, we could say that in a counterfactual world where the people who actually didn’t drink Metababoost™ had instead consumed it regularly, their BMI would be 1.5 kg/m2 lower, on average.

This is the idea behind causation within the potential outcomes framework.
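
The arithmetic behind the table can be checked with a short Python sketch. The numbers are copied from the table of potential outcomes; the variable names (`potential_outcomes`, `effects`, `ate`) are only illustrative.

```python
from statistics import mean

# Potential outcomes copied from the table above:
# name -> (BMI change without Metababoost, BMI change with Metababoost)
potential_outcomes = {
    "Alex": (2, 0),
    "Bonnie": (1, 0),
    "Colin": (0, 0),
    "Danielle": (0, -3),
    "Earl": (-3, -6),
    "Fiona": (-4, -4),
    "Gaston": (-6, -7),
    "Hermine": (-8, -10),
}

# Individual causal effect = outcome if treated minus outcome if untreated.
effects = {name: y1 - y0 for name, (y0, y1) in potential_outcomes.items()}

avg_control = mean([y0 for y0, _ in potential_outcomes.values()])
avg_treated = mean([y1 for _, y1 in potential_outcomes.values()])

# The average treatment effect is the average of the individual effects,
# which equals the difference between the two column averages.
ate = mean(list(effects.values()))

print("average if untreated:", avg_control)
print("average if treated:  ", avg_treated)
print("average effect:      ", ate)
```

Note that this calculation is only possible because the table lists both potential outcomes for every person, which real data never do.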

Small group exercise: Think about a causal claim that you would be interested in evaluating (this doesn’t have to be related to ethnic diversity). How would you state this causal claim in the potential outcomes framework?


If we could observe both potential outcomes for everyone (as in the above table), then finding evidence of causation would be easy!

Of course, we cannot observe these counterfactual worlds. In the real world, people either took Metababoost™ or they didn’t. In other words, our real-world data look something like this:

Illustration of observed change in BMI for people who do and don’t drink Metababoost™
No Metababoost™ Metababoost™ Difference
Alex 2 ?
Bonnie 0 ?
Colin 0 ?
Danielle 0 ?
Earl -6 ?
Fiona -4 ?
Gaston -7 ?
Hermine -8 ?

Since we do not observe counterfactual outcomes, how can we estimate a causal effect?

As it turns out, we cannot estimate a separate treatment effect for each individual (why not?).

But we can estimate the average treatment effect across all individuals…


Randomization and Expectation

Imagine you had a box with a large number of tickets inside. On each ticket is written a value from 0 to 50. Your task is to estimate the average value of the tickets in the box. You randomly choose 100 tickets from the box, and the average on these tickets is 35.

What is your best estimate for the average value of the tickets in the whole box?
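
To get a feel for the box-of-tickets idea, here is a small simulation sketch. The contents of the box are made up (the text does not say what is in it): we assume 10,000 tickets with values spread evenly from 0 to 50, so the numbers printed will not match the 35 in the example.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical box: 10,000 tickets, each with a whole-number value from 0 to 50.
box = [rng.randint(0, 50) for _ in range(10_000)]
box_mean = mean(box)

# One random draw of 100 tickets, as in the text.
one_sample_mean = mean(rng.sample(box, 100))

# Repeat the draw many times: individual sample averages vary,
# but they scatter around the true average of the box.
sample_means = [mean(rng.sample(box, 100)) for _ in range(2_000)]

print(f"true average of the box:   {box_mean:.2f}")
print(f"average of one sample:     {one_sample_mean:.2f}")
print(f"average over 2,000 draws:  {mean(sample_means):.2f}")
```

Any single sample can miss the true average by a little, but there is no tendency to miss in one particular direction. That is what makes the sample average a sensible best guess.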


Returning to our working example, imagine you had a population of 1000 people. You randomly assign 500 of them to drink Metababoost™ for a year (and you make sure they actually do it). Let’s call these people the treatment group (T), and let’s call the change in BMI you measure for these people their treatment outcomes.

The other 500 people constitute the control group (C) and you make sure that they do not consume any Metababoost™ during the year. Their changes in BMI constitute the control outcomes.

Group Discussion:


  1. Recall that you randomly selected 500 people for the treatment group. But are the people in the control group also randomly assigned?
  2. Suppose that the average of the observed treatment outcomes = -2. What is your estimate of the average of all (observed and unobserved) potential treatment outcomes?
  3. Similarly, suppose that the average of the observed control outcomes = +1. What is your estimate of the average of all (observed and unobserved) potential control outcomes?
  4. What if we had only chosen 100 people at random for the treatment group, and 900 for the control group? (How) would our answers change?

Key Takeaway:


Just as you can use the average value of your 100 randomly drawn tickets to estimate the average value of all of the tickets in the box, you can think of the observed treatment outcomes as a random sample of all potential treatment outcomes. Thus, the average of these observed treatment outcomes forms your estimate of the average of all potential treatment outcomes.

Similarly, the average of your observed control outcomes forms your estimate of the average of all potential control outcomes.

Illustration of observed change in BMI for people randomly assigned to drink Metababoost™
No Metababoost™ Metababoost™ Difference
Subject1 2 ?
Subject2 -5 ?
Subject3 0 ?
Subject4 -3 ?
Subject999 -2 ?
Subject1000 -1 ?
AVERAGE -2.25 -3.75 -1.5

In the Table above, even though we cannot observe all of the potential outcomes, we can nonetheless estimate their averages.

Taking the difference between these two estimates yields your average treatment effect (ATE), or the average causal effect of Metababoost™.

NOTE: this only works because you have randomly allocated people into T and C.
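
The logic of the randomized experiment can be sketched as a simulation. Everything about the population below is an assumption made for illustration: the distribution of BMI changes without the drink and the individual effects are invented, and only the design (1000 people, 500 randomly assigned to T) follows the text.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed for reproducibility

# Hypothetical population of 1000 people with BOTH potential outcomes.
# In real data we would never see both values for the same person.
population = []
for _ in range(1000):
    y0 = rng.gauss(-2.25, 3.0)            # BMI change without the drink (made up)
    effect = rng.choice([0, -1, -2, -3])  # made-up individual effect
    population.append((y0, y0 + effect))  # (control outcome, treatment outcome)

true_ate = mean([y1 - y0 for y0, y1 in population])

# Randomly assign exactly 500 people to treatment; the other 500 are control.
treated_ids = set(rng.sample(range(1000), 500))

# We observe only ONE potential outcome per person.
observed_treated = [y1 for i, (y0, y1) in enumerate(population) if i in treated_ids]
observed_control = [y0 for i, (y0, y1) in enumerate(population) if i not in treated_ids]

# Difference in observed group averages = estimated average treatment effect.
estimated_ate = mean(observed_treated) - mean(observed_control)

print(f"true ATE (normally unknowable): {true_ate:.2f}")
print(f"estimated ATE from experiment:  {estimated_ate:.2f}")
```

The estimate will not match the true value exactly, but with random assignment it should land close to it, even though no individual's treatment effect is ever observed.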



Randomization and the Fundamental Problem of Causal Inference

Recall that the fundamental problem of causal inference is that we never observe both potential outcomes for the same person. This problem becomes especially serious when people self-select into T or C.

To return to our working example, the people who choose to drink Metababoost™ may differ in their potential outcomes from the people who choose not to drink it. For instance, suppose that people who care a lot about diet and exercise (e.g. Hermine) also buy Metababoost™, while those who don’t care so much about fitness (e.g. Colin) ignore the whole Metababoost™ fad.

Allowing people to self-select into treatment might yield the following:

Illustration of observed change in BMI depending on whether people choose to drink Metababoost™

            BMI Change if No Metababoost™   BMI Change if Metababoost™   Difference
Alex                      2                                                  ?
Bonnie                    1                                                  ?
Colin                     0                                                  ?
Danielle                  0                                                  ?
Earl                                                    -6                   ?
Fiona                                                   -4                   ?
Gaston                                                  -7                   ?
Hermine                                                -10                   ?
AVERAGE                   0.75                          -6.75               -7.5

Comparing the above table to the full schedule of potential outcomes, we can see that the people in the treatment group would have lost a lot of weight anyway, even if they had not drunk Metababoost™, while the people in the control group would not have lost much weight even if they had bought Metababoost™.

But since we only observe treatment outcomes for fitness freaks and control outcomes for couch potatoes, we overestimate the average effect of Metababoost™: the observed difference is -7.5 kg/m2, whereas the true average effect is -1.5 kg/m2.
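
The size of the overestimate can be worked out directly. The sketch below uses the full schedule of potential outcomes from earlier and assumes, as in the self-selection table, that Earl, Fiona, Gaston and Hermine are the ones who choose to drink Metababoost™. Variable names are illustrative.

```python
from statistics import mean

# Full schedule of potential outcomes:
# name -> (BMI change without Metababoost, BMI change with Metababoost)
potential_outcomes = {
    "Alex": (2, 0),
    "Bonnie": (1, 0),
    "Colin": (0, 0),
    "Danielle": (0, -3),
    "Earl": (-3, -6),
    "Fiona": (-4, -4),
    "Gaston": (-6, -7),
    "Hermine": (-8, -10),
}

# The people who self-select into treatment.
drinkers = {"Earl", "Fiona", "Gaston", "Hermine"}

# What we actually observe: treated outcomes for drinkers,
# untreated outcomes for everyone else.
observed_treated = [y1 for name, (y0, y1) in potential_outcomes.items() if name in drinkers]
observed_control = [y0 for name, (y0, y1) in potential_outcomes.items() if name not in drinkers]

naive_estimate = mean(observed_treated) - mean(observed_control)
true_ate = mean([y1 - y0 for y0, y1 in potential_outcomes.values()])
bias = naive_estimate - true_ate

print("naive estimate:", naive_estimate)
print("true ATE:      ", true_ate)
print("bias:          ", bias)
```

Most of the observed gap between the two groups reflects who chose the drink, not what the drink did.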


More broadly, if we allow people to self-select into T or C, we can no longer consider the observed treatment/control outcomes as a random sample of all potential treatment/control outcomes. Thus, our basis for assessing causality falls apart.

In the absence of random assignment, estimates of the ATE may be biased – that is, if we reran this study a large number of times, our estimates would tend to be systematically too large or too small.
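
By contrast, random assignment gives estimates that are right on average. As a sketch, take the eight people from the potential outcomes table and list every possible way of assigning four of them to T (a simplification of the 500/500 design, chosen so that all assignments can be enumerated). Exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import combinations

# Potential outcomes for Alex, Bonnie, Colin, Danielle, Earl, Fiona, Gaston, Hermine.
y0 = [2, 1, 0, 0, -3, -4, -6, -8]    # BMI change without Metababoost
y1 = [0, 0, 0, -3, -6, -4, -7, -10]  # BMI change with Metababoost
n = len(y0)

true_ate = Fraction(sum(b - a for a, b in zip(y0, y1)), n)

# Every way of putting 4 of the 8 people into the treatment group.
estimates = []
for treated in combinations(range(n), 4):
    control = [i for i in range(n) if i not in treated]
    avg_treated = Fraction(sum(y1[i] for i in treated), 4)
    avg_control = Fraction(sum(y0[i] for i in control), 4)
    estimates.append(avg_treated - avg_control)

# Average of the estimates over all equally likely random assignments.
average_estimate = sum(estimates) / len(estimates)

print("number of possible assignments:", len(estimates))
print("true ATE:                      ", float(true_ate))
print("average of all estimates:      ", float(average_estimate))
print("smallest and largest estimate: ", float(min(estimates)), float(max(estimates)))
```

Individual assignments can still be far off with only eight people (the self-selected split is one of the possible assignments, and the most extreme one), but the errors cancel out across assignments. Self-selection always picks the same lopsided split, so its error never averages away.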

Here is another way of thinking about this problem: suppose there is a third variable – motivation to get fit – which is correlated with both:

  1. change in BMI and
  2. whether or not one buys Metababoost™ (i.e. whether an individual self-selects into T).

Here we can say that motivation confounds the statistical relationship between drinking Metababoost™ and change in BMI. However, since motivation is not measured, it constitutes a source of omitted variable bias.

Randomization solves this problem by making sure that, on average, motivation (plus all other possible confounding variables) is equalized between T and C.
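
A quick simulation sketch shows what "equalized on average" means. The motivation scores are invented, and the self-selection rule (the 500 most motivated people all choose the drink) is a deliberately extreme assumption made to keep the example simple.

```python
import random
from statistics import mean

rng = random.Random(2)  # fixed seed for reproducibility

# Hypothetical motivation score for 1000 people (higher = more motivated).
motivation = [rng.gauss(0, 1) for _ in range(1000)]


def motivation_gap(treated_ids):
    """Average motivation in T minus average motivation in C."""
    t = [m for i, m in enumerate(motivation) if i in treated_ids]
    c = [m for i, m in enumerate(motivation) if i not in treated_ids]
    return mean(t) - mean(c)


# Self-selection (assumed rule): the 500 most motivated people choose the drink.
by_motivation = sorted(range(1000), key=lambda i: motivation[i], reverse=True)
self_selected = set(by_motivation[:500])

# Randomization: 500 people chosen at random, regardless of motivation.
randomized = set(rng.sample(range(1000), 500))

print(f"motivation gap under self-selection: {motivation_gap(self_selected):.2f}")
print(f"motivation gap under randomization:  {motivation_gap(randomized):.2f}")
```

Under randomization the gap is close to zero without motivation ever being measured by the experimenter, and the same holds for any other background characteristic.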


Small Groups:


Now let’s return to the task of estimating the causal effect of (contextual) diversity. Think about Enos’ experiment.

  • How did this experiment work?
  • What two groups did he compare?
  • How were people allocated to T and C?
  • In what ways does this allocation resemble (or not) random draws from a box? In other words, can we use the average observed control outcomes as a counterfactual for the average potential control outcomes?
  • Another way of thinking about the problem: In what ways were T and C different? In what ways were they the same?


Experimenting with Repeated Contact

Suppose your friend sees Enos’ results and says:

“Nice experiment, but he’s just documenting a temporary reaction to the unexpected appearance of Latinos in all-white suburbs. Over time, however, people are going to become more comfortable with diversity. For example, cities have historically been magnets for immigration, and the people living there seem to have no problem with diversity.”

Results: